KMID : 1022420150070040003
Phonetics and Speech Sciences, 2015, Vol. 7, No. 4, pp. 3-9
Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model
Kim Kwang-Ho, Lee Dong-Hyun, Lim Min-Kyu, Kim Ji-Hwan
Abstract
In this paper, we investigate an input dimension reduction method using continuous word vectors in a deep neural network language model. In the proposed method, continuous word vectors are generated with Google's Word2Vec from a large training corpus, so that they satisfy the distributional hypothesis. The 1-of-|V| coded discrete word vectors are then replaced with their corresponding continuous word vectors. In our implementation, the input dimension was successfully reduced from 20,000 to 600 when a tri-gram language model was used with a vocabulary of 20,000 words. The total training time on the Wall Street Journal training corpus (corpus length: 37M words) was reduced from 30 days to 14 days.
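The core substitution described in the abstract can be sketched as follows. This is an illustrative toy example, not the paper's implementation: the vocabulary, the embedding dimension, and the embedding values are all stand-ins (the paper uses a 20,000-word vocabulary and Word2Vec-trained vectors), and the function names are hypothetical.

```python
# Sketch: replacing 1-of-|V| discrete word inputs with continuous word
# vectors at the input layer of a tri-gram neural LM.
# VOCAB and EMB_DIM are illustrative stand-ins, not the paper's values.

VOCAB = ["the", "cat", "sat", "on", "mat"]   # stand-in for a 20,000-word vocabulary
EMB_DIM = 3                                   # stand-in for a low-dimensional Word2Vec vector

# Hypothetical pre-trained embeddings; in the paper these come from Word2Vec.
embeddings = {w: [0.1 * (i + j) for j in range(EMB_DIM)]
              for i, w in enumerate(VOCAB)}

def one_hot_input(history):
    """Discrete coding: each history word becomes a |V|-dim 1-of-|V| vector."""
    vecs = []
    for w in history:
        v = [0.0] * len(VOCAB)
        v[VOCAB.index(w)] = 1.0
        vecs.extend(v)
    return vecs

def continuous_input(history):
    """Continuous coding: each history word becomes an EMB_DIM-dim vector."""
    vecs = []
    for w in history:
        vecs.extend(embeddings[w])
    return vecs

# A tri-gram LM conditions on the 2 preceding words.
history = ["the", "cat"]
print(len(one_hot_input(history)))     # 2 * |V|
print(len(continuous_input(history)))  # 2 * EMB_DIM
```

The input layer thus shrinks from a multiple of |V| to a multiple of the embedding dimension, which is what drives the reported reduction in training time.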
KEYWORD
deep neural network, language model, continuous word vector, input dimension reduction